Appln. No. 10/678,551 
Amendment dated October 24, 2007 
Regarding Office Action dated August 28, 2007 
Docket No. 7463-20 (CE1 1418JUI) 

I. REMARKS/ARGUMENTS 

These remarks are submitted in response to the Office Action of August 28, 2007 
(Office Action). As this response is timely filed within the 3-month shortened statutory 
period, no fee is believed due. 

Applicants have amended independent Claims 1, 6, 11, 20 and 21 to emphasize 
certain aspects of Applicants' invention. Claim 5 has been cancelled. The amendments are 
supported throughout the Specification. No new matter has been introduced by virtue of the 
amendments. Further, this amendment should not require a further search or further 
consideration beyond the consideration previously given since the scope of the amended 
claims is believed to be within the scope of the previously considered claim 5 or other 
claims. 

In paragraph 2, page 2 of the Office Action, Claims 1-21 were rejected under 35 
U.S.C. 102(e) as being anticipated over Galanes et al. (U.S. Patent Application Publication 
No. 2003/0200080, hereafter Galanes) in view of Firman (U.S. Patent Application 
Publication No. 2002/0010582, hereafter Firman) and Ativanichayaphong et al. (U.S. Patent 
Application Publication No. 2004/0236574, hereafter Ativanichayaphong). 

II. Applicants' Invention 

It may be helpful to reiterate certain aspects of Applicants' invention prior to 
addressing the references cited in the Office Action. One embodiment, for example, 
provides in a device, a method for unifying speech user interface and graphic user interface 
commands, comprising the steps of receiving grammar specifying a syntax of at least one 
speech command and having semantic information, processing the grammar to extract the 
semantic information for use with both a graphical user interface and a speech user 
interface, receiving a user input to select at least one component in the graphical user 
interface during a navigation of the graphical user interface, audibly receiving the at least 
one speech command associated with the at least one component from the speech user 
interface, generating semantic directives from user supplied input text, the user input, and 
the at least one speech command to update a recognition vocabulary for recognizing one or 
more non-unified components of the graphical user interface (see paragraph [0005]), 
parsing the grammar between the graphical user interface and the speech user interface in 
accordance with the semantic directives to unify the graphical user interface with the 
grammar from the recognition vocabulary (see paragraph [0007]), visually presenting the at 
least one speech command in a visual prompt of the graphical user interface responsive to 
the step of audibly receiving, and building a corresponding speech command for a plurality 
of speech commands associated with a plurality of unified and non-unified components. The 
corresponding speech command is a single instance of a plurality of speech commands 
associated with a selection of one or more components of the graphical user interface 
during the navigation of the graphical user interface. 

III. The Claims Define Over the Prior Art 

As already noted, independent Claims 1, 6, 11, 16, 20, and 21 were rejected under 
35 U.S.C. 102(e) as being anticipated by Galanes in view of Firman and 
Ativanichayaphong. Applicants respectfully submit, however, that neither Galanes, Firman, 
nor Ativanichayaphong teaches or suggests each feature recited in amended Claims 1, 6, 
11, 16, 20 and 21. 

Firman is directed to a system for enabling voiced utterances to be substituted for 
manipulation of a pointing device. The pointing device can be manipulated to control motion 
of a cursor on a computer display and indicate desired actions associated with the position 
of the cursor on the display. In one embodiment, voiced utterances are converted to 
commands, expressed in a predefined command language, to be used by an operating 
system of a computer. Some voiced utterances are converted into commands 
corresponding to actions to be taken by the operating system. Other voiced utterances are 
converted into commands which carry associated text strings to be used as part of text 
being processed in an application program running under the operating system. 

Ativanichayaphong is directed to a method for enhancing voice interactions within a 
portable multimodal computing device using visual messages. The method can include 
providing a multimodal interface that includes an audio interface and a visual interface. A 
speech input can be received and a voice recognition task can be performed upon the 
speech input. At least one message within the multimodal interface can be visually 
presented, wherein the message is a prompt for the speech input and/or a confirmation of 
the speech input. 

Galanes is directed to controls for a web server that generates client side markup 
enabled with recognition and/or audible prompting. Controls commonly related to visual 
rendering are extended to include attributes related to recognition and/or audible prompting. 
Typically, controls such as a "label" use a library having markup information, which provides 
a visual prompt on a display. Similarly, a "textbox" provides an input field on a visual 
display. In a first embodiment, an additional library is provided for recognition and/or audibly 
prompting, wherein the controls include attributes or parameters to use both libraries (See 
Galanes paragraph [0010]). 

Galanes also discloses an embodiment wherein a set of companion controls having 
attributes related to recognition and/or audible prompting are formed. The companion 
controls use a library having recognition and audibly prompting markup information. The 
companion controls are selectively associated with visual controls. In this manner, 
application logic can remain with the visual controls, wherein the companion controls 
provide recognized results to the visual controls. The companion controls follow a dialog in 
that controls are provided for prompting a question, obtaining an answer, confirming a 
result, providing a command, or making a statement (See paragraph [0012]). 

Although Galanes discloses extending recognition and audible prompting in a 
graphical user interface, and a set of companion controls for providing client-side mark-up 
information, Galanes does not teach generating semantic directives from user supplied 
input text, user input during navigation, and the at least one speech command to update a 
recognition vocabulary for recognizing one or more non-unified components of the graphical 
user interface. Further, Galanes does not teach parsing the grammar between the graphical 
user interface and the speech user interface in accordance with the semantic directives to 
unify the graphical user interface with the grammar from the recognition vocabulary as in 
the amended claims. 

More specifically, Galanes does not teach a system that during a navigation of a 
graphical user interface (GUI) receives grammar specifying a syntax of speech commands 
having semantic information, identifies user inputs (e.g. text input, mouse click) associated 
with a selection of components (e.g. buttons, lists, links) in the GUI, audibly receives a 
speech command (e.g. spoken utterance) associated with a selected component, 
associates the speech command with the components in view of the grammar to produce 
unified components, updates a speech recognizer with user entered text for recognizing one 
or more non-unified components of the graphical user interface from the semantic 
information, and builds a corresponding speech command that is a single instance of a 
plurality of speech commands received in association with a selection of unified and 
non-unified components during the navigation of the GUI. 

For example, as shown in FIG. 3 of Applicants' drawings, a user can audibly say 
speech commands (e.g. spoken utterance) during a navigation of the GUI 300 while 
selecting GUI components. For instance, the user can select "switch to" on a menu bar of 
the GUI (subplot 1) using a mouse click or keystroke command, and then say "Switch to 
Lookup". The GUI captures the audible spoken command then switches to the Lookup 
dialogue (subplot 2) responsive to a selection of the Lookup component, and associates the 
spoken utterance "Switch to Lookup" with the plurality of (i.e. combination of) selected 
components (i.e. "Switch to" + "Lookup"). In such regard a single instance for a spoken 
command can be created and visually presented on the GUI. Similarly, when the user 
audibly says "Lookup entry with" and selects the name component in the dialogue, the GUI 
associates the Lookup entry dialog with the "name" selection. The user (subplot 3) can then 
say "Lookup entry with name Mary Smith" responsive to selecting Mary in the dialogue. In 
such regard, the user builds a corresponding speech command during the navigation of the 
GUI. 
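
By way of illustration only, the FIG. 3 example above can be sketched as follows. The sketch is a simplification offered for explanation, not the implementation described in the Specification; the class name CommandBuilder and its methods are hypothetical and are assumed solely for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommandBuilder:
    """Hypothetical helper: collects the labels of GUI components
    selected during navigation and joins them into one command."""
    parts: List[str] = field(default_factory=list)

    def select(self, label: str) -> None:
        # Record a component the user selected (mouse click or keystroke).
        self.parts.append(label)

    def build(self) -> str:
        # A single instance of the command covering every selection so far.
        return " ".join(self.parts)


builder = CommandBuilder()
builder.select("Switch to")   # menu bar selection (subplot 1)
builder.select("Lookup")      # Lookup component (subplot 2)
command = builder.build()
print(command)
```

In this sketch the two selections yield the single command "Switch to Lookup", which corresponds to the spoken utterance associated with the combination of selected components.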

A unified component is a component that the user has already associated with a 
visually displayed graphical user interface component using text input, user input 
navigation, and speech commands. For instance, a unified component is generated when 
the user says and types in "Look-up" and then selects the LookUp component, or when the 
user says and types in "Mary Smith" and then selects the contact label for Mary Smith. A 
non-unified component is a component that the user has not already associated, by user 
input text and speech commands, with a visually displayed graphical user interface 
component. A non-unified component is a component that does not yet have a 
corresponding speech command with user input text and user input navigation. For 
example, if the user selects a contact label for the name "Greg Jones" but has not 
previously associated a speech command (e.g., "Greg Jones") or user input text with the 
selection of the contact label, then the contact label for "Greg Jones" is a non-unified 
component; that is, it has not been unified with the speech command, the user input text, 
and the graphical user interface component. Although the component is non-unified, the 
semantics engine by way of previous learning associated with unified components can 
navigate the user to these components in view of "unheard" speech commands. 

As an example, a user can build a speech command for "Lookup entry with name 
Mary Smith" by generating semantic directives for the terms "Lookup entry" and the spoken 
name "Mary Smith". The semantic directives associate portions of the speech commands 
with the user input text. For example, the semantic engine can recognize letters from the 
user input text and associate these letters with portions of the spoken utterance. Applicants' 
invention builds semantic directives in view of the user supplied text, user input for 
navigation, and spoken utterances by processing the grammar to extract the semantic 
information for use with both a graphical user interface and a speech user interface. (See 
paragraph [0008], "... the processor can be programmed to update the graphical user 
interface by updating graphical user interface directives and graphical user elements to 
maintain the graphical user interface unified with the speech grammar of the speech user 
interface".) 

The semantic learning unifies the graphical user interface components with the 
speech commands in view of the user input text. This further allows the semantic engine to 
recognize a selection of non-unified components. As an example, although a user may not 
provide (e.g. does not type in the letters) user supplied input text for "Greg Jones", the user 
can say "Lookup entry Greg Jones" to access the non-unified component selection of the 
look up information for Greg Jones. Although the semantic engine never received user 
supplied input text for Greg Jones, it can still determine that the user is interested in a 
"LookUp" operation for a contact label associated with the spoken command "Greg Jones". That 
is, the semantic engine can supplement navigation to non-unified components based on 
semantic learning of unified components. (See paragraph [0006], " ...augment and update a 
speech grammar and recognition vocabulary".) 
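
For explanatory purposes only, the "Greg Jones" example above can be sketched as follows. The sketch assumes plain string matching in place of speech recognition, and the names SemanticsEngine, unify and resolve are hypothetical; it is not asserted to be the method of the Specification.

```python
from typing import List, Optional, Set, Tuple


class SemanticsEngine:
    """Hypothetical sketch: learns action phrases from unified components
    and reuses them to reach components that were never unified."""

    def __init__(self) -> None:
        self.actions: Set[str] = set()   # action phrases learned so far
        self.unified: Set[str] = set()   # labels unified with text and speech

    def unify(self, action: str, label: str) -> None:
        # The user typed and spoke the label, then selected the component.
        self.actions.add(action.lower())
        self.unified.add(label.lower())

    def resolve(self, utterance: str,
                gui_labels: List[str]) -> Optional[Tuple[str, str, bool]]:
        # Return (action, label, was_unified) or None if nothing matches.
        spoken = utterance.lower()
        for action in sorted(self.actions, key=len, reverse=True):
            if spoken.startswith(action + " "):
                target = spoken[len(action) + 1:]
                for label in gui_labels:
                    if label.lower() == target:
                        return action, label, label.lower() in self.unified
        return None


engine = SemanticsEngine()
engine.unify("Lookup entry", "Mary Smith")   # unified component
result = engine.resolve("Lookup entry Greg Jones",
                        ["Mary Smith", "Greg Jones"])
print(result)
```

Here the engine has never received input text for "Greg Jones", yet the action phrase learned from the unified "Mary Smith" component allows the utterance to be resolved to the non-unified contact label.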

Nowhere does Galanes or Firman teach generating semantic directives from user 
supplied input text, user input during navigation, and speech commands to update a 
recognition vocabulary for recognizing one or more non-unified components of the graphical 
user interface, and parsing the grammar between the graphical user interface and the 
speech user interface in accordance with the semantic directives to unify the graphical user 
interface with the grammar from the recognition vocabulary as in amended claim 1. More 
specifically, neither Galanes nor Firman contemplates unified and non-unified components, 
much less navigating a graphical user interface using user supplied input text in 
combination with spoken commands during a navigation of the graphical user interface to 
generate semantic directives. 

In Paragraph 2 of Page 2 of the Office Action, it was stated that although Galanes 
does not disclose receiving speech during the navigation, Firman teaches receiving a 
grammar and audibly receiving a spoken command associated with the utterance. 
However, neither Firman nor Galanes teaches generating semantic directives from user supplied 
text. As shown in the cited portion below, Firman only contemplates converting spoken 
commands to text. Firman does not contemplate generating semantic directives from text, 
and building a corresponding speech command for a plurality of speech commands 
associated with a plurality of unified and non-unified components from the semantic 
directives generated from the text. 

[0005] In general, in another aspect of the invention, voiced utterances are 
converted to commands, expressed in a predefined command language, to be used 
by an operating system of a computer, converting some voiced utterances into 
commands corresponding to actions to be taken by said operating system, and 
converting other voiced utterances into commands which carry associated text 
strings to be used as part of text being processed in an application program running 
under the operating system. (Firman) 

As to the rejection of claims 6 and 21, it was stated in the Office Action that Firman 
teaches the use of a system that unifies a speech user interface and a graphical user 
interface (abstract lines 1-8), receives a grammar specifying a syntax (paragraph [0007], 
lines 1-6), and processes the grammar to extract semantic information (paragraph [0006] 
lines 1-5). Applicants respectfully disagree. At the cited location, Firman only discloses 
"converting other voiced utterances into commands which carry associated text strings to 
be used as part of text being processed." This is not the same as generating semantic 
directives from user supplied input text to update a recognition vocabulary as in amended 
claim 1. Semantic directives determine the meaning of the text. As an example, a semantic 
directive in the phrase "look up name Mary Smith" is a directive to access a contact list and 
find Mary Smith in the list of names. The semantic directive can be a navigation menu 
hierarchy, for instance, that selects a component for contact list followed by selecting a 
component for names. Nowhere does Firman contemplate a semantics engine that uses 
user input text with speech commands to generate unified components, and then augments 
the semantic directives to select non-unified components when user input text is not 
provided. Firman only discloses that received speech commands can be converted to text 
for use in an application. 
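
Solely to illustrate the distinction drawn above, a semantic directive expressed as a navigation menu hierarchy can be sketched as follows. The table NAVIGATION_PATHS and the function to_directive are hypothetical and are assumed only for this example; neither is taken from the Specification or from Firman.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from a command phrase to the menu path it implies.
NAVIGATION_PATHS: Dict[str, List[str]] = {
    "look up name": ["contact list", "names"],
}


def to_directive(phrase: str) -> Optional[List[str]]:
    """Turn a phrase into a navigation path ending in its argument,
    or return None when no known command phrase matches."""
    lowered = phrase.lower()
    for command, path in NAVIGATION_PATHS.items():
        if lowered.startswith(command + " "):
            argument = phrase[len(command) + 1:]   # e.g. the spoken name
            return path + [argument]
    return None


directive = to_directive("look up name Mary Smith")
print(directive)
```

The result is a path through the components for the contact list and the names, ending in "Mary Smith". Converting an utterance to a text string, by contrast, would yield only the characters of the phrase and no such directive.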

In Applicants' invention, a user can submit user input text and audibly present 
spoken utterances to unify components in a graphical user interface with commands of a 
speech user interface. Further, the semantics engine can augment the semantic meaning of 
the user input text with "unheard" spoken utterances to navigate through a graphical user 
interface of non-unified components. In such regard, the user by way of user input text and 
speech commands can update a speech recognizer for semantic meanings of spoken 
utterances to understand the "unheard" words. Nowhere does Galanes or Firman 
contemplate a semantics engine that can interpret spoken commands for selecting non- 
unified components from previously spoken commands that are unified with a user input 
text for previously selected components. 

As to claim 16, it was stated in the Office Action that Firman teaches a system that 
unifies a speech user interface and a graphical user interface (abstract lines 1-8), receives a 
grammar specifying a syntax (paragraph [0007], lines 1-6), and processes the grammar to 
extract semantic information (paragraph [0006] lines 1-5). Applicants respectfully disagree 
for the same reasons above. It was further stated that Ativanichayaphong teaches a system 
that unifies a speech user interface and a graphical user interface (abstract lines 1-5) that 
allows for the user to enter an audio input associated with the object selected for 
association and visually presenting the speech command in a visual prompt (paragraph 
[0008] lines 11-15). However, Ativanichayaphong is not directed to retrieving semantic 
meaning or directives from user input text associated with spoken utterances. 
Ativanichayaphong merely enhances voice interactions using visual messages. More 
specifically, visual prompts are provided to confirm speech command requests. 

Ativanichayaphong does not teach generating semantic directives from the user 
entered text to update a recognition vocabulary for recognizing one or more non-unified 
components of the graphical user interface as in amended claim 16. In particular, 
neither Ativanichayaphong nor Galanes contemplates unifying graphical user interface 
components with speech user interface commands using user input text (semantic 
meaning), spoken commands (speech content), and navigation through the graphical user 
interface, followed by using the semantic directives to navigate through a graphical user 
interface of non-unified components. Nowhere does Ativanichayaphong discuss a semantic 
engine to extract semantic directives from user input text in view of spoken utterances. 

As to claim 20, it was stated in the Office Action that Firman teaches receiving a user 
input to select at least one component in the graphical user interface (abstract lines 1-8), 
receiving a grammar specifying syntax of a command (paragraph [0007] lines 1-6) and 
processing the grammar to extract semantic information (paragraph [0006] lines 1-5). A 
portion of the cited art is presented below. 

[0007] In general, in another aspect, the invention features enabling a user to create 
an instance in a formal language of the kind which has a strictly defined syntax; a 
graphically displayed list of entries are expressed in a natural language and do not 
comply with the syntax, the user is permitted to point to an entry on the list, and the 
instance corresponding to the identified entry in the list is automatically generated in 
response to the pointing. (Firman) 

The portion cited above only indicates that a user can select an entry from a list that 
is in a natural language that does not comply with a strictly defined syntax. In no way 
however does Firman suggest that selecting an entry from the list results in generating 
semantic directives from the user entered text to update a recognition vocabulary for 
recognizing one or more non-unified components of the graphical user interface as in 
amended claim 20. In Firman, the selecting of the entry serves as a substitute for a strictly 
defined syntax. Applicants respectfully assert that the term syntax is different from the term 
semantic. Syntax establishes the rules for the formation of sentences. Semantics is the 
meaning of words in the sentences. Nowhere does Firman contemplate extracting semantic 
meaning from a user input text and associating the semantic meaning with a portion of a 
spoken utterance to learn how a user navigates through components in a graphical user 
interface. The aspect of unifying components in the graphical user interface with spoken 
commands of a speech user interface permits the semantics engine to navigate the user to 
non-unified graphical user interface components. 

Accordingly, Galanes, Firman, and Ativanichayaphong, whether taken alone or in any 
combination, fail to teach each and every feature of independent claims 1, 6, 11, 
16, 20, and 21 as amended. Applicants, therefore, respectfully maintain that amended 
independent Claims 1, 6, 11, 16, 20, and 21 define over the prior art. Applicants further 
respectfully submit that whereas the remaining dependent claims each depend from one of 
the amended independent claims while reciting additional features, the dependent claims 
likewise define over the prior art. 

IV. CONCLUSION 

Applicants believe that this application is now in full condition for allowance. 
Allowance is therefore respectfully requested. Applicants request that the Examiner call the 
undersigned if clarification is needed on any matter within this Amendment, or if the 
Examiner believes a telephone interview would expedite the prosecution of the subject 
application to completion. 

Respectfully submitted, 



Date: October 24, 2007 /Pablo Meles/ 

Pablo Meles, Reg. No. 33,739 

Marc A. Boillot, Reg. No. 56,164 
Customer No. 55794 

Post Office Box 3188 